A Non-deterministic Tokeniser for Finite-State Parsing
Abstract
This paper describes a non-deterministic tokeniser implemented and used for the development of a French finite-state grammar. The tokeniser includes a finite-state automaton for simple tokens and a lexical transducer that encodes a wide variety of multiword expressions, associated with multiple lexical descriptions when required.
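As a rough illustration of what "non-deterministic" means here, the sketch below returns every segmentation of an input rather than committing to one. It is not the paper's implementation: a regular expression stands in for the simple-token automaton, a plain dictionary stands in for the lexical transducer, and the names `SIMPLE_TOKEN`, `MULTIWORD_LEXICON` and `tokenise`, together with the lexicon entries and tag strings, are hypothetical.

```python
import re

# Hypothetical stand-in for the lexical transducer: each multiword expression
# maps to one or more lexical descriptions (tags invented for illustration).
MULTIWORD_LEXICON = {
    "pomme de terre": ["pomme_de_terre+Noun"],
    "bien que": ["bien_que+Conj"],
}

# Hypothetical stand-in for the simple-token automaton:
# a run of word characters, or a single punctuation mark.
SIMPLE_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenise(text):
    """Return all segmentations of `text` as lists of (surface, description)."""
    words = SIMPLE_TOKEN.findall(text)
    results = []

    def extend(pos, path):
        if pos == len(words):
            results.append(path)
            return
        # Reading 1: take the next simple token on its own.
        extend(pos + 1, path + [(words[pos], words[pos])])
        # Reading 2: take any multiword expression that starts here,
        # once for each lexical description attached to it.
        for mwe, descriptions in MULTIWORD_LEXICON.items():
            parts = mwe.split()
            if words[pos:pos + len(parts)] == parts:
                for description in descriptions:
                    extend(pos + len(parts), path + [(mwe, description)])

    extend(0, [])
    return results


if __name__ == "__main__":
    for segmentation in tokenise("la pomme de terre"):
        print(segmentation)
```

Keeping both the word-by-word reading and the multiword reading leaves the choice to a later parsing stage, which is the point of a non-deterministic tokeniser feeding a finite-state grammar.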
Similar resources
Finite state tokenisation of an orthographical disjunctive agglutinative language: The verbal segment of Northern Sotho
Tokenisation is an important first pre-processing step required to adequately test finite-state morphological analysers. In agglutinative languages each morpheme is concatenatively added on to form a complete morphological structure. Disjunctive agglutinative languages like Northern Sotho write these morphemes, for certain morphological categories only, as separate words separated by spaces or ...
An Efficient Parallel Determinisation Algorithm for Finite-state Automata
Determinisation of non-deterministic finite automata (NFA) is an important operation, not only for optimisation purposes but also as a prerequisite for the complementation operation, which in turn is necessary for creating robust pattern matchers, for example in string replacement and robust parsing. In this paper, we present an efficient parallel determinisation algorithm based on a message-pass...
Preference-Driven Bimachine Compilation. An Application to TTS Text Normalisation
This paper describes a grammar formalism and a deterministic parser developed for text normalisation in the rVoice text-to-speech (TTS) system. The rules are formulated using regular expressions and converted into a non-deterministic finite-state transducer (FST). At runtime, search is guided by parsing preferences which the user may associate with regular operators; the best solution is determ...
Constructing Finite State Automata for High Performance Web Services
This paper presents a new XML parsing method based on deterministic finite state automata (DFA). A DFA generator is described that automatically translates XML Schemas to DFAs for efficient parsing of XML documents and SOAP/XML messages. The DFA-based parsing approach supports the implementation of high-performance Web services. Two example case studies are described and performance results are...
Partial parsing via finite-state cascades
Finite-state cascades represent an attractive architecture for parsing unrestricted text. Deterministic parsers specified by finite-state cascades are fast and reliable. They can be extended at modest cost to construct parse trees with finite feature structures. Finally, such deterministic parsers do not necessarily involve trading off accuracy against speed—they may in fact be more accurate th...
Publication date: 1996